Acquisition of Large Scale Categorial Grammar Lexicons
Abstract
A system is presented for inducing Categorial Grammar (CG) lexicons for natural language from either unannotated or minimally annotated corpora extracted from the Penn Treebank. A combination of symbolic and stochastic methods has been used to build a computationally effective and psychologically plausible system that learns linguistically useful lexicons. The system has a variety of parameters, including the corpus annotation used, the knowledge given to the learner, and the weight given to the symbolic and stochastic methods. We present results from a set of experiments that investigate these parameters. The results also show that the system performs well even when compared with systems used for simpler problems.
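To make "CG lexicon" concrete, here is a minimal sketch of what lexicon entries look like and how the two basic categorial rules combine them. It is not the induction system described above. The toy lexicon, the `Functor` encoding, and the `combine`/`parse` helpers are hypothetical illustrations. Each word is paired with a category. `X/Y` looks for a `Y` to its right (forward application), and `X\Y` looks for a `Y` to its left (backward application). A sentence is accepted when its words reduce to `S`. Roughly speaking, inducing a lexicon means discovering entries like these from corpus data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Functor:
    """A complex category such as (S\\NP)/NP; frozen so it can live in sets."""
    result: "Category"
    slash: str  # "/" seeks its argument to the right, "\\" to the left
    arg: "Category"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


Category = Union[str, Functor]  # atomic categories (S, NP, N) are plain strings

# Hypothetical toy lexicon: each word maps to a list of possible categories.
TV = Functor(Functor("S", "\\", "NP"), "/", "NP")  # transitive verb: (S\NP)/NP
LEXICON = {
    "the": [Functor("NP", "/", "N")],
    "dog": ["N"],
    "cat": ["N"],
    "sees": [TV],
}


def combine(left: Category, right: Category) -> set:
    """Apply forward (X/Y Y => X) and backward (Y X\\Y => X) application."""
    results = set()
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        results.add(left.result)
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        results.add(right.result)
    return results


def parse(words: list, lexicon: dict) -> set:
    """CKY-style chart parse; returns the categories spanning the whole input."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(lexicon.get(word, []))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # every split point of the span [i, j)
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        chart[i][j] |= combine(left, right)
    return chart[0][n]


if __name__ == "__main__":
    print(parse("the dog sees the cat".split(), LEXICON))  # -> {'S'}
```

Checking membership of `"S"` in the returned set is a simple stand-in for "this lexicon licenses the sentence". A learner that scores candidate lexicons could use such a check as its symbolic component. That use is an assumption for illustration only.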
Similar Resources
PhD Proposal – The Lexicon in Combinatory Categorial Grammar: An Explanatory Theory of Verbal Categories in Natural Languages
The aim of this project is to elaborate a theory of natural language lexicons for Combinatory Categorial Grammar (CCG), a mildly context-sensitive, polynomially time-parsable variant of categorial grammar. This theory will have both a descriptive aspect, exploring the use of appropriate formal machinery for expressing lexical generalisations, and an explanatory aspect, accounting for observed pa...
An inheritance-based theory of the lexicon in combinatory categorial grammar
This thesis proposes an extended version of the Combinatory Categorial Grammar (CCG) formalism, with the following features: (1) grammars incorporate inheritance hierarchies of lexical types, defined over a simple, feature-based constraint language; (2) CCG lexicons are, or at least can be, functions from forms to these lexical types. This formalism, which I refer to as ‘inheritance-driven’ CCG (I-...
Learning Compact Lexicons for CCG Semantic Parsing
We present methods to control the lexicon size when learning a Combinatory Categorial Grammar semantic parser. Existing methods incrementally expand the lexicon by greedily adding entries, considering a single training datapoint at a time. We propose using corpus-level statistics for lexicon learning decisions. We introduce voting to globally consider adding entries to the lexicon, and pruning ...
An HDP Model for Inducing Combinatory Categorial Grammars
We introduce a novel nonparametric Bayesian model for the induction of Combinatory Categorial Grammars from POS-tagged text. It achieves state-of-the-art performance on a number of languages, and induces linguistically plausible lexicons.
Semantic Bootstrapping of Type-Logical Grammar
A procedure is described which induces type-logical grammar lexicons from sentences annotated with skeletal terms of the simply typed lambda calculus. A generalized formulae-as-types correspondence is exploited to obtain all the type-logical proofs of the sample sentences from their lambda terms, and the resulting lexicons are then optimally unified, which effectively unifies the syntactic categ...